Method and system for problem determination using probe collections and problem classification for the technical support services

ABSTRACT

A system and method for problem determination using probe collections and problem classification for the technical support services monitor and collect data associated with a computer system, raise an alarm based on the monitored and collected data, probe the computer system for additional information, filter the monitored and collected data based on the additional information established from probing, and use the filtered data to label a problem associated with the raised alarm.

BACKGROUND

The present disclosure relates generally to computer systems and service technologies, and more particularly to problem determination using probe collections and problem classification for the technical support services.

An effective problem determination and resolution (PDR) process can contribute to a substantial reduction in technical support services costs. PDR is the process of detecting anomalies in a monitored system, locating the problems responsible for the issue, determining the root cause and fixing the cause of the problem. Thus, once the user (customer or technical personnel) detects a problem, the user first tries to identify the type of problem in order to search for the relevant fix. Especially in the case of software problems in a multi-tier information technology (IT) environment with complex system dependencies, the user may experience a front-end issue caused by a back-end problem. Thus, the problem may be only the effect of an underlying issue within the IT environment and, on one hand, the fixes found may not address the root cause, while on the other hand, the root cause may be buried in large amounts of logs, traces, and monitoring data from healthy resources involved in the propagated failure. Analyzing all the available logs and monitoring data is time consuming and error prone; therefore the PDR process would benefit from filtering the data related to the failing resource.

An example of a multi-tier environment is an e-business system which is supported by an infrastructure including, for example, the following subsystems connected by local and wide area networks: web based presentation services, access services, application business logic, messaging services, database services and storage subsystems. The existing solutions that provide problem determination are limited in that they are problem specific and as such lack the potential of being applied to a wider range of issues. Other known methodologies use only one type of the available information, overlooking or ignoring information that may be relevant to the problem at hand. Yet other known methods provide particular approaches to the PDR technology that are applicable in specific scenarios only.

BRIEF SUMMARY

A method and system for problem determination using probe collections and problem classification for the technical support services are provided. The method, in one aspect, may include monitoring and collecting, by a processor, data associated with a running computer system and raising an alarm, automatically by the processor, based on the monitored and collected data. The method may further include probing the computer system for additional information associated with the alarm; and filtering the monitored and collected data based on the additional information established from probing. The method may also include using the filtered data to label a problem associated with the raised alarm.

A system for problem determination using probe collections and problem classification for the technical support services, in one aspect, may include a monitoring and data collection processing module operable to monitor and collect data associated with a computer system, the monitoring and data collection module further operable to raise an alarm based on the monitored and collected data. The system may also include a probe platform operable to probe the computer system for additional information, the probe platform further operable to filter the monitored and collected data based on the additional information established from probing. The system may further include a classifier module operable to use the filtered data and automatically label a problem associated with the raised alarm.

A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform methods described herein may be also provided.

Further features as well as the structure and operation of various embodiments are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 illustrates a problem determination process using probe collections and problem classification for the technical support services in one embodiment of the present disclosure.

FIG. 2 is a flow diagram illustrating a method of the present disclosure in one embodiment.

FIG. 3 illustrates an example of a computer system which may carry out or execute the systems and methodologies of the present disclosure in one embodiment.

FIGS. 4-6 illustrate examples of probes of the present disclosure.

DETAILED DESCRIPTION

In one embodiment, the system and method of the present disclosure effectively localize IT problems by appropriately probing the failing environment and categorizing the IT resources in view of filtering the available data such as, but not limited to, performance data, resource consumption data, and log data, for more focused or targeted problem determination.

The system and method of the present disclosure may include processes and algorithms to:

-   a) Probe the failing environment, thus localizing the problem and filtering out the logs and monitoring data relevant to the failing resource from the data generated by resources under failure propagation;
-   b) Label the problem (e.g., the problem ticket) and the related log and monitoring data with the real underlying problem that occurred when the data was collected;
-   c) Learn from historical labeled data the patterns specific to the existing problem taxonomy;
-   d) Recognize problems when given a new set of log and monitoring data based on the patterns learned at c);
-   e) Enrich the probing repository by having the community add executable probing algorithms and/or plans, to share the experience across customer environments and problem resolutions.

The system and method of the present disclosure may 1) enhance the problem determination efficiency by appropriately combining different problem determination technologies instead of focusing on a particular one; and 2) enable the user to identify the root cause so that the fix is searched based on the cause rather than on its propagated effects. This may result in operational cost savings. A more focused, filtered set of data increases the accuracy of the problem identification. If the problem results in a service request submission to the technical support, the problem routing can benefit from this initial problem classification instead of symptom based routing. This reduces the risk of re-routing the request when the symptom is a remote effect of a cause located in a completely different resource.

In one embodiment, we build a framework that brings the principles of Service Oriented Architecture (SOA) and information integration to automate problem determination activities. The framework may reduce the time to localize a problem. Following SOA principles, various reusable activities are modeled as services that are further composed to offer higher level services. Examples of such services may include:

-   Probe execution services (e.g., a script execution service whose inputs are a script, the values of associated parameters and the execution environment);
-   Output analysis services (e.g., a service that parses the text output of a probe execution service and creates output in the form of structured data, given mapping rules, possibly based on regular expressions);
-   Information integration services (e.g., a service that provides values to variables given the mapping from query to information stores such as CMDB).

A diagram of a problem determination process using probe collections and problem classification according to a method in one embodiment of the present disclosure is shown in FIG. 1. Each item and the flow of information between them during the operational phase of the system are described below.

The customer or user 102 may be experiencing a problem in their IT environment, for example an IT multi-tier distributed environment 104. An example of such a multi-tier environment is an e-business system which is supported by an infrastructure including subsystems that may be connected by local and wide area networks, such as the following: web based presentation services, access services, application business logic, messaging services, database services and storage subsystems.

A monitor and data collector module 108 may monitor the customer's IT environment 104, for example, by running a process 106, and periodically collecting data such as, but not limited to, performance, resource consumption, and log data. The monitor and data collector module 108 may be a monitoring tool deployed for the customer's IT environment, for example, including monitoring server, agents, probes, data warehouses, etc. Such monitoring may employ tools such as NetSol™, ITM™, Director™.

A problem notification or an alarm 114 may originate from the user. A problem notification or an alarm 110, 112 also may be generated from the monitoring tool, for example, monitor and data collector 108. For instance, if monitored data exceeds a predetermined criterion or threshold, the monitoring tool 108 may raise an alarm. The monitoring tool 108 also may provide data associated with monitoring shown at 116 and 124. This data may include, but is not limited to, performance data, resource utilization data, inventory, logs, and traces. The type of data provided may depend on the type of monitoring tool employed as the monitoring tool 108. Data is provided, for example, to the End-to-end Probing Platform (EPP), also referred to as a probing plan execution platform 118, the Labeled Data Repository 120 and the Classifier module 122.

The monitored data 116 may be filtered, for example, by the monitor and data collector module 108, and sent to Problem Determination Module 128 as shown at 126, to Labeled Data Repository 120 as shown at 124, and to Classifier module 122 as shown at 150. This data includes data filtered from the data at 116, i.e., data collected by the monitoring tool 108, by using the information on the failing resource(s) also collected at the monitoring tool 108. All the data related to well behaving resources is filtered out.

A probing plan execution platform 118 may include, for example, a workflow engine such as a Business Process Execution Language (BPEL) engine, a probe descriptor interpreter, connectors to the configuration and monitoring databases such as configuration management database (CMDB) 132, connectors to the managed systems such as ssh client, http client, java rmi client, and web services client, various types of probe launchers and result analyzers including script executer, text parsers for script output, logs, traces data, etc., and web service response analyzer. Based on the received alarm type 110 and monitoring data 116, logs, etc., the platform 118 decides on and executes the workflow in the probing plan.

Examples of probes may include Telnet and Ping. Telnetting to a database server port or invoking an operation of a Java Management Extensions (JMX) MBean of an Application Server are examples of system probes. For instance, the Telnet probe can be used to check whether a database server is running, while the JMX probe can be used to check whether the connections held by the Application Server are usable.

Low level services may be grouped in a collection of probes based on different ways of launching the probes and analyzing the outputs, such as invoking a web service, executing a database query, enabling ARM or monitoring transaction paths. For instance, the following illustrates a low level service template that executes an OS “command” and produces “textOutput” that contains the output of the command:

-   OSCommand Execution Service
-   Input - String: command
-   Output - String: textOutput

This service executes the command on its hosting machine in one embodiment. An advanced service for a similar purpose may use host, sshUser and sshPassword as other possible inputs for executing the command remotely via ssh.
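As a minimal sketch of the local variant of this template, the following Python function takes a `command` string and returns its `textOutput`. The function name, the `timeout` parameter and the use of `shlex.split` to tokenize the command are illustrative assumptions, not part of the disclosure.

```python
import shlex
import subprocess
import sys


def os_command_execution_service(command: str, timeout: float = 10.0) -> str:
    """Execute `command` on the hosting machine and return its text output.

    Hypothetical helper: the name and the timeout are assumptions.
    """
    completed = subprocess.run(
        shlex.split(command),   # tokenize without invoking a shell
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.stdout


if __name__ == "__main__":
    # Run the current Python interpreter as a portable stand-in for an OS command.
    cmd = f"{shlex.quote(sys.executable)} -c {shlex.quote('print(42)')}"
    print(os_command_execution_service(cmd))
```

The remote variant mentioned above would additionally need an ssh client; it is left out here because the disclosure does not specify one.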

As another example, the following illustrates a low level service template that creates an instance of a java class, namely javaClassName, and populates its attributes using regular expression based functions applied on text. The input attribute_RegEx contains the mapping of attribute names to such functions:

-   Simple Text Analysis Service
-   Input - String: text
    -   String: javaClassName
    -   Map: attribute_RegEx
-   Output - Object: output
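The template above can be approximated in Python as follows. This is a sketch under stated assumptions: the Java class instance is replaced by a `types.SimpleNamespace`, so the javaClassName input is dropped, and each entry of `attribute_regex` is assumed to be a pattern with one capture group whose match becomes the attribute value.

```python
import re
from types import SimpleNamespace


def simple_text_analysis_service(text: str, attribute_regex: dict) -> SimpleNamespace:
    """Build an object whose attributes are extracted from `text`.

    `attribute_regex` maps attribute names to regular expressions with one
    capture group (an assumed convention). Attributes with no match are None.
    """
    output = SimpleNamespace()
    for name, pattern in attribute_regex.items():
        match = re.search(pattern, text)
        setattr(output, name, match.group(1) if match else None)
    return output


if __name__ == "__main__":
    probe_text = "Filesystem /dev/sda1 use 87% mounted on /"
    rules = {"device": r"Filesystem (\S+)", "used_pct": r"use (\d+)%"}
    print(simple_text_analysis_service(probe_text, rules))
```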

A simple system/network level probe template, namely Telnet, may be composed of lower level services. The probe can take two input parameters, ip and port, and produce a Boolean output, namely serviceRunning, that shows whether a process is listening on the given port. An example flow for such a probe is shown in FIG. 4.
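One way to sketch the Telnet probe in Python is a plain TCP connection attempt, which answers the same question (is a process listening on the port?) without using the Telnet protocol itself. The function name and the `timeout` parameter are assumptions; the flow of FIG. 4 is not reproduced here.

```python
import socket


def telnet_probe(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return serviceRunning: True if a TCP connection to (ip, port) succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeout.
        return False
```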

Another probe may be built that abstracts the Telnet probe at a higher level so that it can easily be used to check whether a database server is running. For example, a high level probe template, namely IsDBServerUP, may be composed of the Telnet Probe and the CMDB Query Service. This probe may provide a higher level of abstraction by taking DBServerID (the unique ID of a DBServer in the CMDB) as input and producing a boolean output, namely isDBServerUP, that shows whether the DBServer is running fine. FIG. 5 illustrates an example of this probe. Other advanced implementations of a similar service may involve more rigorous probing, e.g., issuing a database query.
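The composition described above can be sketched as two small services chained together. Everything here is illustrative: the CMDB is stood in for by an in-memory dictionary, the record fields `ip` and `port` are assumed names, and the port-level probe is injected as a callable so that the composite probe does not depend on how the lower level probe is implemented.

```python
import socket
from typing import Callable, Mapping, Tuple


def tcp_port_probe(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Lower level probe: True if something accepts connections on (ip, port)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def cmdb_query_service(cmdb: Mapping[str, dict], db_server_id: str) -> Tuple[str, int]:
    """Resolve a DBServerID to (ip, port) using a dictionary stand-in for the CMDB."""
    record = cmdb[db_server_id]
    return record["ip"], record["port"]


def is_db_server_up(
    db_server_id: str,
    cmdb: Mapping[str, dict],
    probe: Callable[[str, int], bool] = tcp_port_probe,
) -> bool:
    """Composite probe: CMDB lookup followed by the port-level probe."""
    ip, port = cmdb_query_service(cmdb, db_server_id)
    return probe(ip, port)
```

Injecting the probe also makes the composite easy to exercise without a real database server, for example by passing a stub that returns a fixed value.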

A composition of probes that also includes control flow logic may provide a higher level probe referred to as a probing plan or probe plan. For instance, a simple probe template may operate at the application level and encapsulate a probing plan. A probing plan may include composite probes/services and control flow, data transformations and flow, with decision control flow. FIG. 6 illustrates an example of a probe template that operates at an application level and encapsulates a probing plan. For instance, services shown in FIG. 6, e.g., Service1, Service2, Service3, Service4, Service5, Service6, represent probes incorporated in the control flow of FIG. 6. The flow shown in FIG. 6 omits the details of data transformations and flow, for simplicity. The decision box 614 illustrates the handling of variation in the topology of the solution, based on information available in CMDB.

The low level services described above can be grouped in a collection of probes based on different ways of launching the probes and analyzing the outputs, such as invoking a web service, executing a database query, enabling ARM or monitoring transaction paths. FIG. 4 illustrates a simple system/network level probe template, namely Telnet, while FIG. 5 illustrates another probe that abstracts the Telnet probe at a higher level so that it can easily be used to check whether a database server is running. When a composition of probes includes control flow logic as well, the higher level probe is referred to as a probe plan. One such probe plan is illustrated in FIG. 6. The probe plans may be generated in that manner in the present disclosure to create repetitive problem determination processes.

In FIG. 1, the operational architecture for probes may comprise probe descriptors 140, probing 134, probe platform 118, CMDB dependencies 132, PD plans 136, and Query 130. The community 138 can contribute various low level services, simple probes, composite probes or probing plans to a shared repository called PD Plans 136. The framework of the present disclosure also may enable experts who develop various PD plans in their labs to contribute and share the PD plans for reuse in any data center or remote management services environment (e.g., IBM Remote Infrastructure Management Services (RIMS)) without requiring any change to the PD plan template. The Probe Platform 118 may provide the run-time services for executing probe templates in a given context. In the Probe Platform 118, the PD plan 136 gets customized with the CMDB information.

During the workflow execution, when the platform 118 needs to execute a probe expressed by a generic descriptor (e.g., probe template), the platform 118 instantiates the values of the execution host and the probe parameters by querying (as shown at 130) the CMDB 132 related to that particular IT environment 104 (e.g., specified in the probe descriptor). Based on the probe results 134 received back from the IT environment 104, the probing plan execution platform 118 filters out the anomalous replies from the flawless ones and thus better localizes the failing resource(s). The data related to the targeted failing resource(s) is sent to the problem classifier 122, as shown at 150, for further analysis.
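The filtering step can be sketched as follows, assuming (for illustration only) that each monitoring record is a dictionary with a `resource` key and that probe results are reduced to a mapping from resource name to a healthy/unhealthy flag. Neither representation is specified in the disclosure.

```python
from typing import Dict, List


def filter_to_failing_resources(
    records: List[dict], probe_results: Dict[str, bool]
) -> List[dict]:
    """Keep only the monitoring records that belong to resources whose probe failed.

    `probe_results` maps a resource name to True (flawless reply) or
    False (anomalous reply); both structures are assumed for this sketch.
    """
    failing = {name for name, healthy in probe_results.items() if not healthy}
    return [record for record in records if record["resource"] in failing]


if __name__ == "__main__":
    monitoring_data = [
        {"resource": "web", "msg": "HTTP 500"},
        {"resource": "app", "msg": "connection pool timeout"},
        {"resource": "db", "msg": "listener not responding"},
    ]
    results = {"web": True, "app": True, "db": False}
    print(filter_to_failing_resources(monitoring_data, results))
```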

A database, a file or a collection of files 132 (e.g., CMDB) contains the information about the environment being managed. This information includes, but is not limited to, details of managed entities and structural relationships among those. Another database, a file or a collection of files 136 may store the probing algorithms and/or plans. The probing plans range from general purpose plans to services for specific plans. In one embodiment, the plans are environment independent.

The probe platform 118 gathers data by querying (130) the probe descriptor (PD) plan 136 and CMDB 132. The probe platform 118 uses the PD plan data and data from CMDB 132 to instantiate its probes.

The probing algorithms and/or plans may be developed and/or updated by a community of users. For instance, the community 138 may develop and consume various probing plans using authoring tooling shown at 140. The community 138 may be users or experts that utilize various systems and/or IT environments. In this way the community shares the experience gained across customer environments and problem resolutions, and populates at 140 the probing algorithms and/or plans repository 136.

As discussed above, authoring tooling or tools 140 may be used for the creation of probing plans. The tooling 140 may provide the environment to express the flow of a probing plan with its specific steps. These steps include, but are not limited to, (i) invoking another probing plan, (ii) gathering environment configuration data from CMDB-like repositories for just-in-time customization of the probing plan, as well as of related tools and scripts, and (iii) gathering data from these tools and running scripts. The semantics of how to launch a probe and how to interpret its results are specified by an expert. This semantic description is referred to as a probe descriptor. Experts associate a probe descriptor with a particular resource or relationship type defined in the CMDB schema. A few items in the probe descriptor are concrete, e.g., what script to execute and what parameters it has. Further, a few of the description items are environment specific and hence cannot be specified concretely in the probe descriptor. These items are specified in terms of graph paths on the CMDB object model starting from the associated model element, e.g., how to derive a host machine, where to execute the script specified in the probe descriptor, how to compute the parameter values, etc.

Shown at 134 is an example of the process of probing the managed environment by the probing plan execution platform 118. A probe descriptor added by the community or an expert via the tool at 140 is interpreted by the probing platform 118 and probing results are produced. Thus, the community may provide instructions by way of a probe descriptor to query the environment for certain data. Under specific monitoring event types, predefined probes are applied to the environment to gather more detailed data. A particular example of probe collection is the one leading to the decomposition of the end-to-end transaction response time into the intervals spent at each resource involved in the transaction. This decomposition can be used to pinpoint the resource that, due to failure, has an increased response time compared to its normal behavior. Other mechanisms may be used in conjunction with the embodiments of the present invention. Based on the probe results received back from the IT environment, the probing plan execution platform 118 performs further analysis.
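As a small illustration of using the response time decomposition, the sketch below compares the interval spent at each resource against a baseline of normal behavior and reports the resource with the largest relative increase. The ratio-based comparison, the function name and the millisecond dictionaries are assumptions made for this example; the disclosure does not prescribe how the comparison is done.

```python
from typing import Dict, Tuple


def decompose_and_pinpoint(
    observed_ms: Dict[str, float], baseline_ms: Dict[str, float]
) -> Tuple[float, str]:
    """Return (end-to-end response time, resource with the largest slowdown).

    Slowdown is measured as observed / baseline per resource, which is an
    assumed criterion; baselines must be non-zero.
    """
    end_to_end = sum(observed_ms.values())
    suspect = max(observed_ms, key=lambda name: observed_ms[name] / baseline_ms[name])
    return end_to_end, suspect


if __name__ == "__main__":
    observed = {"web": 40.0, "app": 120.0, "db": 900.0}
    baseline = {"web": 35.0, "app": 100.0, "db": 60.0}
    print(decompose_and_pinpoint(observed, baseline))
```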

A classifier module 122 receives the filtered or focused data 150 and automatically generates the root cause label 142. The classifier module 122 may be populated with patterns relevant to the problem at hand, and a label is sent to the Problem Determination module 128 based on the current problem monitoring data 150. The classifier module 122, for example, categorizes any problem a user experiences by recognizing the problem specificity, leveraging all available data such as, but not limited to, performance data, resource consumption data, and log data. As shown at 146, historical labeled monitoring data 120 is used to learn the patterns specific to an existing problem taxonomy; then problems may be recognized when given a new set of monitoring and log data based on the patterns previously learned.

The Problem Determination processes and tools 128 are used to fix the customer's incident. They may take the form of a self-assist tour (e.g., IBM Support Assistant (ISA)) or of a cycle through the technical support process. The root cause label 144 is manually given or attached to a problem by the individual who solved and closed the problem ticket opened for the customer's issue. The root cause label 144 may be stored in the labeled data repository 120. This label may be different from the type of problem initially inferred based on the problem symptoms in the Alarms 110, 112 and 114.

A database, a file or a collection of files 120 stores the monitoring data together with the corresponding label, i.e., the type of problem during which that data was generated by the monitored systems. In one embodiment, technical personnel or an operator or the like may manually assign a “problem label” to the monitored data. A learning module or process 146 learns from the labeled data 120 the patterns stored in the problem data patterns classifier 122. Any techniques or algorithms for detecting patterns and classifying data that are known or will be known may be used in conjunction with the embodiments of the present invention.

The following is a pseudo algorithm of the classifier 122:

Input: M classifiers {f1(x), ..., fM(x)}, a test instance x
for each classifier fm(x) (m = 1...M)
    if fm(x) classifies x to be negative (0)
        assign ym = 0
    else if fm(x) classifies x to be positive (1)
        if yparent(m) = 1 then assign ym = 1
        else assign ym = 0
end for
Output: predicted class label y for x
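The pseudo algorithm can be sketched in Python as a top-down pass over a problem taxonomy: a node is labeled positive only if its own classifier fires and its parent was labeled positive. The dictionary-based representation and the treatment of root nodes (no parent, so only the node's own classifier counts) are assumptions, since the pseudo algorithm does not say how roots are handled.

```python
from typing import Callable, Dict, Optional


def hierarchical_predict(
    classifiers: Dict[str, Callable[[str], bool]],
    parent: Dict[str, Optional[str]],
    x: str,
) -> Dict[str, int]:
    """Return y_m in {0, 1} for every taxonomy node m.

    classifiers[m](x) plays the role of f_m(x); parent[m] is the parent node,
    or None for a root (assumed to behave as if y_parent = 1).
    """
    labels: Dict[str, int] = {}

    def label_of(node: str) -> int:
        if node not in labels:
            up = parent.get(node)
            parent_positive = True if up is None else label_of(up) == 1
            labels[node] = 1 if (classifiers[node](x) and parent_positive) else 0
        return labels[node]

    for node in classifiers:
        label_of(node)
    return labels


if __name__ == "__main__":
    # Toy keyword classifiers standing in for learned patterns.
    toy_classifiers = {
        "database": lambda text: "db" in text,
        "db_connection_pool": lambda text: "pool" in text,
        "network": lambda text: "timeout" in text,
    }
    toy_parent = {"database": None, "db_connection_pool": "database", "network": None}
    print(hierarchical_predict(toy_classifiers, toy_parent, "db pool exhausted"))
```

Resolving parents recursively means the nodes may be listed in any order, whereas the pseudo algorithm appears to assume that a parent's label is available before its children are visited.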

The resolution 148 to the problem at hand, either found by the user or provided by the technical support service based on the data from the probe platform 118 and the problem label at 142, may also be provided.

The integration of the technologies presented in FIG. 1, e.g., Problem Classification, Probes, ISA, on top of the monitoring platform, may lead to cost savings and a decrease in PDR duration, for example, by:

-   Saving labor in building and maintaining environment specific scripts/probes;
-   Sharing the new PD plans across environments, developed out of experiences from one account;
-   Providing the abstraction of probe knowledge for other people to compose higher level probes; and
-   Empowering the service desk operators to execute relevant PD plans to gather better failure targeted monitoring data. The monitoring DATA is filtered based on the PD plans results into DATA′ for ISA for further analysis before forwarding the problem tickets to technical support personnel.

FIG. 2 is a flow diagram illustrating a method of the present disclosure in one embodiment. The steps described below need not be performed in the sequential order as described below. Rather, some of the steps may occur asynchronously or simultaneously with other steps. At 202, the monitor and data collector module monitors and collects data from the computer system or IT environment. The data may include, but is not limited to, data such as performance data, resource consumption data, and log data. At 204, the monitor and data collector module may raise an alarm based on an anomaly in the monitored data. In addition, at 206, a user may raise an alarm, for example, if the user using the computer system or IT environment detects a problem or error. At 208, the monitor and data collector module sends the alarm and associated data that caused the alarm to a probe platform. A user raised alarm and associated data may also be sent to the probe platform. For instance, a customer may call or send an email to the technical support service, which in turn may collect the necessary data and send the alarm and associated data that caused the alarm to a probe platform.

At 210, the probe platform automatically further probes the computer system or IT environment for additional information related to the problem raised as an alarm. For instance, a PD plan is used and CMDB is queried to create a customized probe for the client.

At 212, based on the probed information or results of step 210, the probe platform filters the data received from the monitor and data collector module at step 208 to focus the data on the problem. For instance, in the case of problems in a complex IT environment with multi-tier system dependencies, the same issue may cause various failures, and hence generate several notifications, at different levels of the multi-tier system. In such cases, the user would greatly benefit from embedded mechanisms to filter the monitoring data related to the failing resource from all the generated data in the system. An example of such a multi-tier environment is an e-business system supported by an infrastructure that may have the following subsystems connected by local and wide area networks: web based presentation services, access services, application business logic, messaging services, database services and storage subsystems. A complex probe would generate testing probes to see what works and what does not, thus informing on what to focus the monitoring data collection.

At 214, the filtered data is sent to the labeled data repository and the problem data classifier. The problem data classifier matches the filtered data against the labeled patterns generated by the classification process in order to label the received monitoring data. At 216, the problem data classifier automatically determines the root cause label associated with the problem that caused the alarm to be raised at step 204 or 206.

At 218, a problem determination module may also manually determine the root cause label by using the filtered data from the probe platform. For instance, a technical support service may manually find the root cause and/or ISA may guide the problem determination process by suggesting possible tasks to perform for detecting the root cause. The determined root cause label may be stored in the labeled data repository for future reference.

At 220, the solution or fix to the problem is sent to the user. For instance, once the problem is identified, known solutions may be sent to the user.

As will be appreciated by one skilled in the art, the present inventionmay be embodied as a system, method or computer program product.Accordingly, the present invention may take the form of an entirelyhardware embodiment, an entirely software embodiment (includingfirmware, resident software, micro-code, etc.) or an embodimentcombining software and hardware aspects that may all generally bereferred to herein as a “circuit,” “module” or “system.” Furthermore,the present invention may take the form of a computer program productembodied in any tangible medium of expression having computer usableprogram code embodied in the medium.

Any combination of one or more computer usable or computer readablemedium(s) may be utilized. The computer-usable or computer-readablemedium may be, for example but not limited to, an electronic, magnetic,optical, electromagnetic, infrared, or semiconductor system, apparatus,device, or propagation medium. More specific examples (a non-exhaustivelist) of the computer-readable medium would include the following: anelectrical connection having one or more wires, a portable computerdiskette, a hard disk, a random access memory (RAM), a read-only memory(ROM), an erasable programmable read-only memory (EPROM or Flashmemory), an optical fiber, a portable compact disc read-only memory(CDROM), an optical storage device, a transmission media such as thosesupporting the Internet or an intranet, or a magnetic storage device.Note that the computer-usable or computer-readable medium could even bepaper or another suitable medium, upon which the program is printed, asthe program can be electronically captured, via, for instance, opticalscanning of the paper or other medium, then compiled, interpreted, orotherwise processed in a suitable manner, if necessary, and then storedin a computer memory. In the context of this document, a computer-usableor computer-readable medium may be any medium that can contain, store,communicate, propagate, or transport the program for use by or inconnection with the instruction execution system, apparatus, or device.The computer-usable medium may include a propagated data signal with thecomputer-usable program code embodied therewith, either in baseband oras part of a carrier wave. The computer usable program code may betransmitted using any appropriate medium, including but not limited towireless, wireline, optical fiber cable, RF, etc.

Computer program code for carrying out operations of the presentinvention may be written in any combination of one or more programminglanguages, including an object oriented programming language such asJava, Smalltalk, C++ or the like and conventional procedural programminglanguages, such as the “C” programming language or similar programminglanguages. The program code may execute entirely on the user's computer,partly on the user's computer, as a stand-alone software package, partlyon the user's computer and partly on a remote computer or entirely onthe remote computer or server. In the latter scenario, the remotecomputer may be connected to the user's computer through any type ofnetwork, including a local area network (LAN) or a wide area network(WAN), or the connection may be made to an external computer (forexample, through the Internet using an Internet Service Provider).

The present invention is described with reference to flowchartillustrations and/or block diagrams of methods, apparatus (systems) andcomputer program products according to embodiments of the invention. Itwill be understood that each block of the flowchart illustrations and/orblock diagrams, and combinations of blocks in the flowchartillustrations and/or block diagrams, can be implemented by computerprogram instructions. These computer program instructions may beprovided to a processor of a general purpose computer, special purposecomputer, or other programmable data processing apparatus to produce amachine, such that the instructions, which execute via the processor ofthe computer or other programmable data processing apparatus, createmeans for implementing the functions/acts specified in the flowchartand/or block diagram block or blocks. These computer programinstructions may also be stored in a computer-readable medium that candirect a computer or other programmable data processing apparatus tofunction in a particular manner, such that the instructions stored inthe computer-readable medium produce an article of manufacture includinginstruction means which implement the function/act specified in theflowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Referring now to FIG. 3, the systems and methodologies of the present disclosure may be carried out or executed in a computer system that includes a processing unit 302, which houses one or more processors and/or cores, memory and other systems components (not shown expressly in the drawing) that implement a computer processing system, or computer that may execute a computer program product. The computer program product may comprise media, for example a hard disk, a compact storage medium such as a compact disc, or other storage devices, which may be read by the processing unit 302 by any techniques known or that will be known to the skilled artisan for providing the computer program product to the processing system for execution.

The computer program product may comprise all the respective features enabling the implementation of the methodology described herein, and which, when loaded in a computer system, is able to carry out the methods. Computer program, software program, program, or software, in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: (a) conversion to another language, code or notation; and/or (b) reproduction in a different material form.

The computer processing system that carries out the system and method of the present disclosure may also include a display device such as a monitor or display screen 304 for presenting output displays and providing a display through which the user may input data and interact with the processing system, for instance, in cooperation with input devices such as the keyboard 306 and mouse device 308 or pointing device. The computer processing system may also be connected or coupled to one or more peripheral devices such as the printer 310, scanner (not shown), speaker, and any other devices, directly or via remote connections. The computer processing system may be connected or coupled to one or more other processing systems such as a server 310, other remote computer processing system 314, network storage devices 312, via any one or more of a local Ethernet, WAN connection, Internet, etc. or via any other networking methodologies that connect different computing systems and allow them to communicate with one another. The various functionalities and modules of the systems and methods of the present disclosure may be implemented or carried out in a distributed manner on different processing systems (e.g., 302, 314, 318), or on any single platform, for instance, accessing data stored locally or in a distributed manner on the network.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements, if any, in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Various aspects of the present disclosure may be embodied as a program, software, or computer instructions embodied in a computer or machine usable or readable medium, which causes the computer or machine to perform the steps of the method when executed on the computer, processor, and/or machine. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform various functionalities and methods described in the present disclosure is also provided.

The system and method of the present disclosure may be implemented and run on a general-purpose computer or special-purpose computer system. The computer system may be any type of system known or that will be known and may typically include a processor, memory device, a storage device, input/output devices, internal buses, and/or a communications interface for communicating with other computer systems in conjunction with communication hardware and software, etc.

The terms “computer system” and “computer network” as may be used in the present application may include a variety of combinations of fixed and/or portable computer hardware, software, peripherals, and storage devices. The computer system may include a plurality of individual components that are networked or otherwise linked to perform collaboratively, or may include one or more stand-alone components. The hardware and software components of the computer system of the present application may include and may be included within fixed and portable devices such as a desktop, laptop, or server. A module may be a component of a device, software, program, or system that implements some “functionality”, which can be embodied as software, hardware, firmware, electronic circuitry, etc.

The embodiments described above are illustrative examples and it should not be construed that the present invention is limited to these particular embodiments. Thus, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

1. A computer-implemented method for problem determination using probe collections and problem classification for technical support services, comprising: monitoring and collecting, by a processor, data associated with a running computer system; raising an alarm, automatically by the processor, based on the monitored and collected data; probing the computer system for additional information associated with the alarm based on a probe plan including composite probes and control flow, data transformations and flow with decision control flow, stored in a shared repository, the shared repository being updatable by a community of probe developers wherein the community is enabled to provide instructions for querying the computer system by way of one or more probes stored in the shared repository; filtering the monitored and collected data based on the additional information established from probing, the filtering including filtering out one or more replies from one or more components of the computer system running flawlessly and localizing one or more failing resources; and using the filtered data to label a problem associated with the raised alarm.
2. The method of claim 1, wherein the step of using the filtered data includes matching the filtered data against labeled patterns.
3. The method of claim 2, wherein the labeled patterns are generated using a classification process.
4. The method of claim 1, wherein the step of probing includes automatically executing a probe of said one or more probes.
5. The method of claim 4, further including using a probe descriptor and data associated with the computer system to instantiate the probe of said one or more probes, wherein the probe descriptor includes a generic descriptor and the data associated with the computer system is obtained by querying a configuration management database storing information related to a plurality of IT environments, wherein the data associated with the computer system is used as one or more probe parameters.
6. The method of claim 1, further including manually raising an alarm.
7. A non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform a method for problem determination using probe collections and problem classification for technical support services, comprising: monitoring and collecting, by a processor, data associated with a running computer system; raising an alarm, automatically by the processor, based on the monitored and collected data; probing the computer system for additional information associated with the alarm based on a probe plan including composite probes and control flow, data transformations and flow with decision control flow, stored in a shared repository, the shared repository being updatable by a community of probe developers wherein the community is enabled to provide instructions for querying the computer system by way of one or more probes stored in the shared repository; filtering the monitored and collected data based on the additional information established from probing, the filtering including filtering out one or more replies from one or more components of the computer system running flawlessly and localizing one or more failing resources; and using the filtered data to label a problem associated with the raised alarm.
8. The program storage device of claim 7, wherein the step of using the filtered data includes matching the filtered data against labeled patterns.
9. The program storage device of claim 8, wherein the labeled patterns are generated using a classification process.
10. The program storage device of claim 7, wherein the step of probing includes automatically executing a probe of said one or more probes.
11. The program storage device of claim 10, further including using a probe descriptor and data associated with the computer system to instantiate the probe of said one or more probes, wherein the probe descriptor includes a generic descriptor and the data associated with the computer system is obtained by querying a configuration management database storing information related to a plurality of IT environments, wherein the data associated with the computer system is used as one or more probe parameters.
12. The program storage device of claim 7, further including manually raising an alarm.
13. A system for problem determination using probe collections and problem classification for technical support services, comprising: a processor; a monitoring and data collection processing module operable to monitor and collect data associated with a computer system, the monitoring and data collection module further operable to raise an alarm based on the monitored and collected data; a probe platform operable to execute on the processor and further operable to probe the computer system for additional information based on a probe plan including composite probes and control flow, data transformations and flow with decision control flow, stored in a shared repository, the shared repository being updatable by a community of probe developers wherein the community is enabled to provide instructions for querying the computer system by way of one or more probes stored in the shared repository, the probe platform further operable to filter the monitored and collected data based on the additional information established from probing, the filtering including filtering out one or more replies from one or more components of the computer system running flawlessly and localizing one or more failing resources; and a classifier module operable to use the filtered data and automatically label a problem associated with the raised alarm.
14. The system of claim 13, wherein the computer system includes a computer system infrastructure having a plurality of subsystems connected by local and wide area networks.
15. The system of claim 13, wherein the classifier module is operable to match the filtered data against labeled patterns.
16. The system of claim 15, wherein the classifier module is operable to generate the labeled patterns using a classification process.
17. The system of claim 13, wherein the probe platform is operable to automatically execute a probe of said one or more probes.
18. The system of claim 17, further including a probe descriptor plan database storing a plurality of probe descriptors.
19. The system of claim 18, further including a configuration management database storing data associated with the computer system.
20. The system of claim 19, wherein the probe platform is operable to instantiate the probe of said one or more probes using one or more of the probe descriptors and the configuration management database data, wherein the probe descriptor includes a generic descriptor and the data associated with the computer system is obtained by querying the configuration management database storing information related to a plurality of IT environments, wherein the data associated with the computer system is used as one or more probe parameters to the generic descriptor.
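
By way of illustration only, and not as a limitation of the claims, the probe instantiation, filtering, and labeling steps recited above may be sketched in Python as follows. The sketch assumes that a generic probe descriptor is a command template with named placeholders, that the configuration management database lookup has already returned a flat record of parameters, and that labeled patterns are simple keyword-to-label pairs. All class names, function names, host names, and commands (`ProbeDescriptor`, `ProbeReply`, `instantiate_probe`, `filter_replies`, `label_problem`, `check_tcp`) are hypothetical and do not appear in the disclosure; probe execution and alarm raising are omitted.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class ProbeDescriptor:
    """Hypothetical generic descriptor: a command template with placeholders."""
    name: str
    command_template: str


@dataclass(frozen=True)
class ProbeReply:
    """Hypothetical reply returned by one probed component."""
    resource: str
    healthy: bool
    detail: str


def instantiate_probe(descriptor: ProbeDescriptor, cmdb_record: dict) -> str:
    """Fill the generic descriptor with parameters taken from a CMDB record."""
    return Template(descriptor.command_template).substitute(cmdb_record)


def filter_replies(replies: list) -> list:
    """Drop replies from components running flawlessly; keep the failing ones."""
    return [reply for reply in replies if not reply.healthy]


def label_problem(failing: list, labeled_patterns: dict) -> str:
    """Match the filtered data against labeled patterns; the first match wins."""
    for reply in failing:
        for keyword, label in labeled_patterns.items():
            if keyword in reply.detail:
                return label
    return "unclassified"


# Assumed CMDB record and generic descriptor (illustrative values only).
cmdb_record = {"host": "db01.example.com", "port": "5432"}
descriptor = ProbeDescriptor("tcp_check", "check_tcp --host $host --port $port")
command = instantiate_probe(descriptor, cmdb_record)

# Assumed probe replies; a real probe platform would produce these by
# executing the instantiated probes against the monitored system.
replies = [
    ProbeReply("web01", True, "HTTP 200"),
    ProbeReply("db01", False, "connection refused on port 5432"),
]
failing = filter_replies(replies)

# Assumed labeled patterns, standing in for the output of a classifier.
patterns = {
    "connection refused": "database unreachable",
    "disk full": "storage exhausted",
}
label = label_problem(failing, patterns)
```

In this sketch the healthy reply from `web01` is filtered out, `db01` is localized as the failing resource, and the keyword match stands in for whatever classification process generates and applies the labeled patterns in a given embodiment.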